1 Some Baseball Data

We have two data files, Batting.csv (95195 lines of text) and Master.csv (17916 lines of text).

Batting.csv (partial)

Master.csv (partial)

2 Analysis Goals

3 Load the Data and Create the Relations
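
The loading step can be pictured with a small plain-Python sketch. It is not the Pig script from the slide: the sample rows are invented, and the column names playerID, yearID and R are assumptions about the layout of Batting.csv. The sketch reads comma-separated text and keeps only player ID, year and runs, which is roughly what projecting three fields into a relation does.

```python
import csv
import io

# Hypothetical stand-in for Batting.csv. The real file has many more columns;
# the column names used here (playerID, yearID, R) are assumptions.
SAMPLE_BATTING = """playerID,yearID,R
player_a,2000,10
player_b,2000,25
player_c,2001,
player_a,2001,7
"""

def load_runs(text):
    """Return (player_id, year, runs) tuples, skipping rows with no run value."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        if rec["R"] == "":
            continue  # missing value, comparable to a null field
        rows.append((rec["playerID"], int(rec["yearID"]), int(rec["R"])))
    return rows

runs = load_runs(SAMPLE_BATTING)
```

Rows with an empty run field are dropped here for simplicity; a real script may treat missing values differently.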

4 The Beginning Portion of Relation B

5 This Achieved the Same Result as in the Hive Case Study

6 Another Way to Create the Relations

7 The Beginning Portion of Relation B

8 Find the Highest Run for Each Year
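
As a rough illustration of this step, the sketch below groups rows by year and takes the maximum run value in each group, which is the group-then-aggregate pattern the slide title describes. It is plain Python rather than Pig, and the sample rows are invented for the example.

```python
from collections import defaultdict

# Hypothetical (player_id, year, runs) rows, invented for illustration.
runs = [
    ("player_a", 2000, 10),
    ("player_b", 2000, 25),
    ("player_a", 2001, 7),
    ("player_c", 2001, 7),
]

def max_run_by_year(rows):
    """Group run values by year, then take the maximum of each group."""
    grouped = defaultdict(list)
    for _player, year, r in rows:
        grouped[year].append(r)
    return {year: max(values) for year, values in grouped.items()}

max_by_year = max_run_by_year(runs)
```

Note that the result carries only the year and the maximum; the player ID is lost in the aggregation and has to be recovered in a later step.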

9 The Beginning Portion of Relation D, MAX(run) by Year

10 Find the Highest Run for Each Year with Player ID
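
One way to recover the player ID is to match the per-year maximum back against the original rows on both year and run value. The sketch below shows that idea in plain Python with invented sample rows; it is an assumption about the approach, not a transcription of the slide's script.

```python
# Hypothetical (player_id, year, runs) rows, invented for illustration.
runs = [
    ("player_a", 2000, 10),
    ("player_b", 2000, 25),
    ("player_a", 2001, 7),
    ("player_c", 2001, 7),
]

def top_scorers_by_year(rows):
    """Return every (player_id, year, runs) row whose runs equal that year's maximum."""
    best = {}
    for _player, year, r in rows:
        if year not in best or r > best[year]:
            best[year] = r
    # Matching on (year, runs) plays the role of a join back to the raw rows.
    matches = [(p, y, r) for p, y, r in rows if best[y] == r]
    return sorted(matches, key=lambda t: (t[1], t[0]))

top = top_scorers_by_year(runs)
```

Because the match is on the run value, a year in which several players share the maximum yields several rows, as the 2001 rows in the sample show.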

11 Find the Highest Run for Each Year with Player ID (Beginning)

12 Find The Highest Run for Each Year with Player ID (End)

13 Just Show the Three Columns: Player_id, Year and MaxRun

14 Just Show the Three Columns: Player_id, Year and MaxRun

15 Load Data from Master.csv

16 A Relation Was Created Successfully

17 The 2nd Goal

18 The Pig Script

19 The Result

20 The 3rd Goal

21 The Pig Script to Find the All-Year Max Run
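
The all-year maximum treats the whole data set as a single group and takes one maximum over it. A minimal Python sketch of that idea, using invented sample rows:

```python
# Hypothetical (player_id, year, runs) rows, invented for illustration.
runs = [
    ("player_a", 2000, 10),
    ("player_b", 2000, 25),
    ("player_a", 2001, 7),
]

def all_year_max(rows):
    """Treat every row as one group and return the single highest run value."""
    return max(r for _player, _year, r in rows)

overall_max = all_year_max(runs)
```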

22 The All-Year Highest Run

23 Pig Script to Locate the Player Name and Year of the All-Year Max Run
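
Locating the player name means combining two things: the rows that reach the all-year maximum, and a lookup from player ID to name. The sketch below assumes Master.csv can be reduced to such a lookup; the names and rows are invented placeholders, and the code is plain Python rather than the Pig script on the slide.

```python
# Hypothetical (player_id, year, runs) rows, invented for illustration.
runs = [
    ("player_a", 2000, 10),
    ("player_b", 2000, 25),
    ("player_a", 2001, 7),
]

# Hypothetical stand-in for Master.csv: player ID mapped to a display name.
names = {"player_a": "Alex Example", "player_b": "Blake Sample"}

def all_year_top(rows, name_lookup):
    """Return (name, year, runs) for every row that reaches the all-year maximum."""
    top = max(r for _player, _year, r in rows)
    # Fall back to the raw player ID when no name is known for it.
    return [(name_lookup.get(p, p), y, r) for p, y, r in rows if r == top]

result = all_year_top(runs, names)
```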

24 The Result

25 The 4th Goal

26 The Pig Script to Find Year Average Run
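
The yearly average follows the same group-by-year pattern as the yearly maximum, with an average in place of the maximum. A small Python sketch with invented rows:

```python
from collections import defaultdict

# Hypothetical (player_id, year, runs) rows, invented for illustration.
runs = [
    ("player_a", 2000, 10),
    ("player_b", 2000, 25),
    ("player_a", 2001, 7),
    ("player_c", 2001, 9),
]

def average_run_by_year(rows):
    """Group run values by year and return the mean of each group."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of runs, row count]
    for _player, year, r in rows:
        totals[year][0] += r
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in totals.items()}

avg_by_year = average_run_by_year(runs)
```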

27 The Year Average Run

28 The 5th Goal

29 The Script to Calculate All-Year Average Run
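
"All-year average" can be read in two ways: the mean over every row, or the mean of the yearly averages. The slide title does not say which one is intended, so the sketch below (plain Python, invented rows) computes both to show that they can differ when years have different numbers of rows.

```python
from collections import defaultdict

# Hypothetical (player_id, year, runs) rows, invented for illustration.
runs = [
    ("player_a", 2000, 10),
    ("player_b", 2000, 25),
    ("player_a", 2001, 7),
]

def average_over_rows(rows):
    """Mean of the run values across all rows, ignoring the year."""
    values = [r for _player, _year, r in rows]
    return sum(values) / len(values)

def average_of_yearly_averages(rows):
    """Mean of the per-year means, so each year counts equally."""
    by_year = defaultdict(list)
    for _player, year, r in rows:
        by_year[year].append(r)
    yearly = [sum(v) / len(v) for v in by_year.values()]
    return sum(yearly) / len(yearly)

row_avg = average_over_rows(runs)
year_avg = average_of_yearly_averages(runs)
```

With the sample rows the first reading gives 14.0 and the second gives 12.25, because the year 2000 contributes two rows and 2001 only one.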

30 The All-Year Average Run